Progress Memo 2
Final Project
Data Science 1 with R (STAT 301-1)
Introduction
Data Overview
This dataset was acquired by scraping TripAdvisor (TA), a well-known travel website, for its restaurant information. It contains 1,083,397 restaurants across European countries and 42 variables, of which 25 are categorical and 17 are numerical. The raw datasets for Europe’s largest cities were then carefully selected and combined for further examination.
It is important to note that this dataset comprises only restaurants registered in the TripAdvisor database, so it might not encompass all the restaurants within a city.
Cleaning the data
In the process of cleaning the data, various string manipulation functions and transformation techniques were employed using the dplyr and stringr packages in R. The dataset underwent a series of refinements to make it tidier and to facilitate downstream analyses. Key steps in the cleaning process include:
• Variable Renaming
• Creating and Modifying Variables
• Handling Categorical Data
• Text Processing
• List Manipulation
• Numeric Extraction
• Data Filtering and Handling Missing Values
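The steps listed above can be sketched on a few toy rows. This is only a minimal illustration, not the actual cleaning script: the raw column names, the format of the price strings, and the values are all assumptions made for the example.

```r
library(dplyr)
library(stringr)

# Toy rows standing in for the scraped data.
# Column names and raw string formats are hypothetical.
raw <- tibble(
  `Restaurant Name` = c("  Flunch ", "Don & Donna", "Chez Exemple"),
  price_range       = c("EUR 14 - EUR 17", "EUR 50 - EUR 65", NA),
  cuisines          = c("French, European", "Greek, Mediterranean", "French"),
  price_level       = c("cheap", "expensive", "mid-range")
)

clean <- raw %>%
  # Variable renaming: switch to a snake_case name
  rename(restaurant_name = `Restaurant Name`) %>%
  mutate(
    # Text processing: trim and collapse stray whitespace
    restaurant_name = str_squish(restaurant_name),
    # Numeric extraction: first and last number in the price string
    price_low  = as.numeric(str_extract(price_range, "\\d+")),
    price_high = as.numeric(str_extract(price_range, "\\d+$")),
    # Creating a variable: midpoint of the price range
    avg_price  = (price_low + price_high) / 2,
    # Handling categorical data: ordered factor for price level
    price_level = factor(price_level,
                         levels = c("cheap", "mid-range", "expensive"),
                         ordered = TRUE),
    # List manipulation: one character vector of cuisines per restaurant
    cuisine_list = str_split(cuisines, ",\\s*")
  ) %>%
  # Data filtering and missing values: drop rows without a usable price
  filter(!is.na(avg_price))

print(clean)
```

Dropping rows with a missing price is only one possible choice; keeping them and excluding them plot by plot would preserve more observations.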
Starting the EDA
Univariate Analysis
To find patterns or unusual trends, I started by analyzing each variable in the dataset.
For Restaurants
Because there are too many observations to plot at once, I decided to look at the 10 most common restaurant names.
This bar plot shows that several restaurants share the same name. Looking back at the data, I realized that even though those restaurants have the same name, they are all in different cities. Taking Flunch as an example:
| restaurant_name | city |
|---|---|
| Flunch | Franconville |
| Flunch | Mers-les-Bains |
| Flunch | Villebon-sur-Yvette |
| Flunch | Le Quesnoy |
| Flunch | Strasbourg |
| Flunch | Bordeaux |
| Flunch | Pau |
| Flunch | Clermont-Ferrand |
| Flunch | Tours |
| Flunch | Besancon |
| Flunch | Nantes |
| Flunch | Poitiers |
| Flunch | Avignon |
| Flunch | Antibes |
| Flunch | Roanne City |
| Flunch | Epagny |
| Flunch | Moulins |
| Flunch | Macon |
| Flunch | Montbeliard |
| Flunch | Thionville |
| Flunch | Boulogne-sur-Mer |
| Flunch | Cholet |
| Flunch | Amiens |
| Flunch | Manosque |
| Flunch | Vitrolles |
| Flunch | Le Pontet |
| Flunch | Saint-Jean-de-la-Ruelle |
| Flunch | Herouville-Saint-Clair |
| Flunch | Pertuis |
| Flunch | Noyelles-Godault |
| Flunch | Bonneuil-sur-Marne |
| Flunch | Charleville-Mezieres |
| Flunch | Chambery |
Thus, I realized that those restaurants form a chain, which is why there is more than one restaurant under each of those names. Something particularly interesting is that all the top restaurant chains are French. The restaurant chain with the highest number of restaurants is Leon de Bruxelles.
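A minimal sketch of how repeated names can be counted and checked against cities is shown here. The Flunch rows come from the table above; the single-location row and the tibble name `restaurants` are hypothetical.

```r
library(dplyr)

# Three Flunch rows from the table above plus one hypothetical
# single-location restaurant for contrast.
restaurants <- tibble(
  restaurant_name = c("Flunch", "Flunch", "Flunch", "Chez Exemple"),
  city            = c("Strasbourg", "Bordeaux", "Pau", "Lyon")
)

name_counts <- restaurants %>%
  group_by(restaurant_name) %>%
  summarise(
    n_locations = n(),              # rows sharing this name
    n_cities    = n_distinct(city), # distinct cities for this name
    .groups     = "drop"
  ) %>%
  arrange(desc(n_locations)) %>%
  slice_head(n = 10)                # the 10 most common names

print(name_counts)
```

When `n_cities` equals `n_locations`, every restaurant with that name is in a different city, which is consistent with the name belonging to a chain.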
For Average Rating
This plot shows that European restaurants on TripAdvisor, across the 31 countries in the dataset, have high ratings, approximately between 4 and 4.8. This could suggest that the average quality offered in European restaurants is very good. This will be studied in more depth in the multivariate section.
For the Open Days Per Week
In this plot it is possible to see that most restaurants are open seven days a week, followed by six and five days per week. That makes sense, since restaurants generally need to be open five days or more in order to make a profit.
However, some restaurants are open four days or fewer, which is atypical. The impact of this low number of opening days will be explored in the multivariate section.
For Country
This plot displays the number of restaurants per country. France has the highest number of restaurants in this dataset, which could potentially explain why the top 10 restaurant chains are French. Croatia and Finland are the countries with the fewest restaurants on TripAdvisor. France will be explored in more depth in a later section.
For Average Price
This histogram is right-skewed, with a mode around 20 to 30 euros. This could indicate that the majority of European restaurants that appear on TripAdvisor are affordable and generally do not exceed 50 euros. However, there are some exceptions, which appear as outliers in the boxplot, with prices ranging from 100 euros to 500 euros.
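As a worked illustration of how boxplot outliers are commonly identified, the sketch below applies the conventional 1.5 × IQR rule to a handful of prices. The price values are hypothetical and chosen only to mimic a right-skewed distribution; they are not taken from the dataset.

```r
# Hypothetical average prices in euros (right-skewed on purpose)
avg_price <- c(12, 15.5, 18, 22, 25, 27, 30, 35, 48, 120, 390)

# First and third quartiles (R's default quantile method)
q <- quantile(avg_price, c(0.25, 0.75))

# Upper fence: third quartile plus 1.5 times the interquartile range
iqr         <- q[[2]] - q[[1]]
upper_fence <- q[[2]] + 1.5 * iqr

# Prices above the fence are flagged as high outliers
outliers <- avg_price[avg_price > upper_fence]

print(upper_fence)
print(outliers)
```

With a long right tail, only the upper fence matters in practice, since the lower fence falls below zero for prices like these.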
For Price Level
It is evident in this bar plot that most of the restaurants are mid-range, aligning with what was observed in the average price plot above. This reinforces the idea that the food offered in the majority of the restaurants in this dataset is affordable and potentially budget-friendly.
For Special Diets
This plot shows that most of the restaurants on TripAdvisor do not offer special diets in their menus. However, there is some presence of vegetarian options. It is also possible that, for some restaurants, this information was unknown and was therefore recorded as if they do not offer special diets. Thus, any measured impact of special diets may be inaccurate and not meaningful for this EDA.
For Cuisines
This plot shows that most of the restaurants, more than 10,000, offer European cuisine. This makes sense, since the restaurants I am exploring are located in different European cities.
There is a moderate presence of restaurants, around 1,875, that also work as bars. Asian cuisine is offered by around 1,250 restaurants. African and North American cuisines have a lower presence in the menus of the European restaurants. Fusion and South American cuisines are barely offered in those restaurants. Oceanian cuisine has the lowest presence among the restaurants in this database.
Multivariate Analysis
Location of the restaurants
This latitude vs longitude plot shows that most restaurants are located in France. This reinforces the univariate analysis, which indicated that France has the highest number of restaurants in the dataset.
Food Top and Bottom Ratings
Since there are a lot of observations, a plot of all of them would be complicated to read. Thus, to make the analysis more comprehensible, I decided to narrow the observations studied. Since most of the restaurants offer European cuisine, I decided to explore those restaurants to make the EDA more meaningful.
This filtered dataset will be used to explore the food, service and value ratings in this section.
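The filtering and ranking described above could look roughly like the sketch below. The food ratings are taken from tables in this memo, but the `cuisines` strings, the non-European row, and the object names are assumptions made for illustration.

```r
library(dplyr)
library(stringr)

# Food ratings for the first three rows come from tables in this memo;
# the cuisines strings and the last row are hypothetical.
restaurants <- tibble(
  restaurant_name = c("Don & Donna", "Flunch", "Bar a Huitres", "Hypothetical Sushi"),
  cuisines        = c("European, Greek", "European, French",
                      "European, Seafood", "Asian, Japanese"),
  food_rating     = c(1.5, 2.0, 4.5, 5.0)
)

# Keep only restaurants whose cuisine list mentions "European"
european <- restaurants %>%
  filter(str_detect(cuisines, fixed("European")))

# Top and bottom restaurants by food rating.
# n = 2 keeps the toy output small; the memo looks at the top 20.
top_food    <- european %>% slice_max(food_rating, n = 2, with_ties = FALSE)
bottom_food <- european %>% slice_min(food_rating, n = 2, with_ties = FALSE)

print(top_food)
print(bottom_food)
```

Setting `with_ties = FALSE` returns exactly `n` rows; with many restaurants tied at a rating of 5, the default `with_ties = TRUE` would return all of them.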
In this plot it is seen that the top 20 restaurants possess a food rating of 5 out of 5. This suggests that European cuisine is not only affordable, as drawn from the previous section's analysis, but also really tasty.
Another interesting finding is that these restaurants with the top food ratings are French, which links with the overall trend of the high performance and presence of restaurants in France.
In this plot, most of the restaurants at the bottom have a low food rating of 2.0. One restaurant from the Flunch chain has a slightly higher food rating of 2.5. The lowest food rating is 1.5, from Don & Donna.
| restaurant_name | country | food_rating | avg_price | price_level |
|---|---|---|---|---|
| Don & Donna | Greece | 1.5 | 57.5 | expensive |
| Flunch | France | 2.0 | 15.0 | cheap |
| Flunch | France | 2.0 | 15.5 | cheap |
| Flunch | France | 2.5 | 15.5 | cheap |
It is interesting that Don & Donna, located in Greece, is still marked as expensive despite its low food rating. Meanwhile, the restaurants of the French chain Flunch shown here, with food ratings between 2.0 and 2.5, are priced as cheap, at around 15 euros.
Service Top and Bottom Ratings
This plot shows that the restaurants that have the highest service rating, 5 out of 5, are French.
This plot shows the restaurants with the lowest service ratings. The most common rating among them is 2.5, followed by 2.0. The lowest service rating belongs to Don & Donna, which also has the lowest food rating, as seen previously.
The Flunch chain appears again, meaning that it has not only a low food rating but also a low service rating.
Value Top and Bottom Ratings
This plot shows that the restaurants that have the highest value rating, 5 out of 5, are French.
This plot shows that the restaurants with the lowest value ratings mostly have a rating of 2. Something particular about this plot is that Don & Donna appears again at the bottom.
Price and Rating Relation
In this plot it is evident that average rating and price are not directly proportional: a restaurant does not have a high rating just because it is expensive. For example, a restaurant with a menu priced around 390 euros has a rating of 4.5, while another restaurant with a menu around 50 euros has a higher rating of 5. Thus, it is inferred that other factors, such as experience and quality, and not only price, matter to consumers when rating restaurants.
This logic is also reinforced when looking at the most expensive restaurants. In particular, Brasserie Og Restaurant NO76, a Danish restaurant, is the most expensive restaurant in this dataset, yet has an average rating of 4.5. That rating does not reflect poorly on the restaurant, but it could contribute to why the restaurant is more expensive than others.
Nevertheless, Au Bon Accueil, a French restaurant, has an average rating below 3.75 and an average price around 275 euros. It is questionable how a restaurant can charge such a substantial price with a less than meritorious rating.
Hence, it can be concluded that while price may exert some influence on the restaurant’s average rating, and vice versa, there exist additional factors, namely the ambience, food quality, and service, that significantly shape the overall experience for each patron, thereby influencing the performance of the restaurant at large.
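One way to put a number on how weakly price and rating move together is a rank correlation. The sketch below uses Spearman's coefficient on a few hypothetical price and rating pairs; the values are illustrative only, and this is not necessarily the method behind the plot discussed above.

```r
# Hypothetical (avg_price, avg_rating) pairs, for illustration only
avg_price  <- c(15.5, 50, 57.5, 275, 390)
avg_rating <- c(3.0, 5.0, 2.0, 3.5, 4.5)

# Spearman's rho compares the rank orderings of the two variables,
# so it is less sensitive to extreme prices than Pearson's r
rho <- cor(avg_price, avg_rating, method = "spearman")

print(rho)
```

A value close to 0 would be consistent with the reading that expensive restaurants are not systematically rated higher, while a value close to 1 would contradict it.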
Exploring particular restaurants
Don & Donna
As explored before, Don & Donna has appeared at the bottom in the food, service and value ratings:
| restaurant_name | food_rating | service_rating | value_rating | avg_price | price_level |
|---|---|---|---|---|---|
| Don & Donna | 1.5 | 1.5 | 1 | 57.5 | expensive |
Thus it is possible to infer that Don & Donna is the worst restaurant in this dataset based on its food, service and value ratings. Still, it is interesting that even though its ratings are poor, its price level is expensive, with an average price of 57.5 euros. This suggests that Greek restaurants may be expensive regardless of their rating, although a single restaurant is not enough to confirm it.
Flunch
Flunch, the French chain, is among the chains with the most restaurants, yet it has some of the lowest food and service ratings, as seen previously in the EDA.
| restaurant_name | city | food_rating | service_rating | value_rating | avg_rating | avg_price | price_level |
|---|---|---|---|---|---|---|---|
| Flunch | Mers-les-Bains | 3.5 | 4.0 | 4.0 | 3.5 | 15.5 | cheap |
| Flunch | Poitiers | 3.5 | 3.5 | 4.0 | 3.0 | 15.5 | cheap |
| Flunch | Herouville-Saint-Clair | 3.5 | 3.5 | 4.0 | 3.5 | 15.5 | cheap |
| Flunch | Strasbourg | 3.5 | 3.5 | 3.5 | 3.0 | 16.5 | cheap |
| Flunch | Clermont-Ferrand | 3.5 | 3.5 | 3.5 | 3.0 | 15.5 | cheap |
| Flunch | Charleville-Mezieres | 3.5 | 3.5 | 3.5 | 3.0 | 15.5 | cheap |
| Flunch | Nantes | 3.0 | 3.5 | 3.5 | 3.0 | 15.5 | mid-range |
| Flunch | Roanne City | 3.0 | 3.5 | 3.5 | 3.0 | 15.5 | mid-range |
| Flunch | Moulins | 3.0 | 3.5 | 3.5 | 3.0 | 15.5 | cheap |
| Flunch | Manosque | 3.0 | 3.5 | 3.5 | 3.0 | 15.5 | cheap |
| Flunch | Pertuis | 3.0 | 3.5 | 3.5 | 3.0 | 15.5 | mid-range |
| Flunch | Villebon-sur-Yvette | 3.0 | 3.0 | 3.5 | 2.5 | 15.5 | cheap |
| Flunch | Antibes | 3.0 | 3.0 | 3.5 | 2.5 | 15.5 | cheap |
| Flunch | Macon | 3.0 | 3.0 | 3.5 | 3.0 | 15.5 | cheap |
| Flunch | Boulogne-sur-Mer | 3.0 | 3.0 | 3.5 | 3.0 | 15.5 | cheap |
| Flunch | Cholet | 3.0 | 3.0 | 3.5 | 3.0 | 15.5 | mid-range |
| Flunch | Amiens | 3.0 | 3.0 | 3.5 | 2.5 | 15.5 | cheap |
| Flunch | Chambery | 3.0 | 3.0 | 3.5 | 2.5 | 15.5 | cheap |
| Flunch | Besancon | 3.0 | 3.0 | 3.0 | 3.0 | 15.5 | cheap |
| Flunch | Noyelles-Godault | 3.0 | 3.0 | 3.0 | 3.0 | 15.5 | cheap |
| Flunch | Avignon | 2.5 | 3.0 | 3.5 | 2.5 | 15.5 | cheap |
| Flunch | Franconville | 2.5 | 3.0 | 3.0 | 2.5 | 15.5 | cheap |
| Flunch | Le Quesnoy | 2.5 | 3.0 | 3.0 | 2.5 | 15.5 | cheap |
| Flunch | Tours | 2.5 | 3.0 | 3.0 | 3.0 | 15.5 | cheap |
| Flunch | Thionville | 2.5 | 3.0 | 3.0 | 2.5 | 15.5 | cheap |
| Flunch | Bordeaux | 2.5 | 2.5 | 3.0 | 2.5 | 15.5 | cheap |
| Flunch | Pau | 2.5 | 2.5 | 3.0 | 2.5 | 17.0 | cheap |
| Flunch | Vitrolles | 2.5 | 2.5 | 3.0 | 2.5 | 15.5 | cheap |
| Flunch | Le Pontet | 2.5 | 2.5 | 3.0 | 2.5 | 15.5 | cheap |
| Flunch | Bonneuil-sur-Marne | 2.5 | 2.5 | 3.0 | 2.5 | 15.5 | cheap |
| Flunch | Epagny | 2.5 | 2.5 | 2.5 | 2.0 | 15.5 | cheap |
| Flunch | Saint-Jean-de-la-Ruelle | 2.0 | 2.5 | 3.0 | 2.5 | 15.5 | cheap |
| Flunch | Montbeliard | 2.0 | 2.0 | 2.5 | 2.0 | 15.0 | cheap |
Moreover, consider the French restaurants with the lowest ratings in food, service and value:
| restaurant_name | food_rating | service_rating | value_rating | avg_rating | avg_price | price_level |
|---|---|---|---|---|---|---|
| Les Chandelles | 2.0 | 2.0 | 2.0 | 2.0 | 18.0 | mid-range |
| La Confiance | 2.0 | 2.0 | 2.0 | 2.0 | 27.0 | mid-range |
| Flunch | 2.0 | 2.0 | 2.5 | 2.0 | 15.0 | cheap |
| Brasserie de l'Evéché | 2.0 | 2.5 | 2.0 | 2.0 | 17.5 | mid-range |
| Le Saint Clair | 2.0 | 2.5 | 2.0 | 2.5 | 18.5 | mid-range |
| Les Comptoirs Casino | 2.0 | 2.5 | 2.0 | 2.0 | 12.5 | cheap |
| Flunch | 2.0 | 2.5 | 3.0 | 2.5 | 15.5 | cheap |
| Mecenate | 2.5 | 2.0 | 2.5 | 2.5 | 22.5 | mid-range |
| A La Maree | 2.5 | 2.5 | 2.0 | 2.0 | 17.0 | mid-range |
| Del Arte Chartres | 2.5 | 2.5 | 2.5 | 2.5 | 21.0 | mid-range |
| Le Molière | 2.5 | 2.5 | 2.5 | 2.0 | 30.0 | mid-range |
| A la maree | 2.5 | 2.5 | 2.5 | 2.5 | 31.0 | mid-range |
| L'Exocet | 2.5 | 2.5 | 2.5 | 2.5 | 29.0 | mid-range |
| Brasserie Les Platanes | 2.5 | 2.5 | 2.5 | 2.5 | 13.5 | mid-range |
| Restaurant Del Arte Annecy | 2.5 | 2.5 | 2.5 | 2.5 | 21.5 | mid-range |
It is clear that Flunch is not the worst restaurant in France, since that title goes to Les Chandelles and La Confiance.
Nevertheless, consider other French restaurants around the same average price and price level:
| restaurant_name | food_rating | service_rating | value_rating | avg_rating | avg_price | price_level |
|---|---|---|---|---|---|---|
| Bar a Huitres | 4.5 | 4.5 | 5.0 | 4.5 | 15.5 | cheap |
| Le fromage rit | 4.5 | 4.5 | 4.5 | 4.5 | 15.5 | mid-range |
| La P'tite Franquette | 4.5 | 4.5 | 4.5 | 4.5 | 15.5 | mid-range |
| La Table de Charbon-Blanc | 4.5 | 4.5 | 4.5 | 4.0 | 15.5 | mid-range |
| le O2 verdun | 4.5 | 4.5 | 4.5 | 4.5 | 15.5 | mid-range |
| La Cocotte des Halles | 4.5 | 4.5 | 4.5 | 4.5 | 15.5 | mid-range |
| Restaurant Plus Belle La Vie | 4.5 | 4.5 | 4.5 | 4.5 | 15.5 | mid-range |
| La Table du Malvan | 4.5 | 4.5 | 4.5 | 4.5 | 15.5 | mid-range |
| L'Eau Vive | 4.5 | 4.5 | 4.5 | 4.5 | 15.5 | mid-range |
| O Patio | 4.5 | 4.5 | 4.5 | 4.5 | 15.5 | mid-range |
| Le Goëlic | 4.5 | 4.5 | 4.5 | 4.5 | 15.5 | mid-range |
| O Ptit Paradis | 4.5 | 4.5 | 4.5 | 4.5 | 15.5 | mid-range |
| Le Tablier | 4.5 | 4.5 | 4.5 | 4.5 | 15.5 | mid-range |
| Buron des Bouals | 4.5 | 4.5 | 4.5 | 4.5 | 15.5 | mid-range |
| Le Chancel | 4.5 | 4.5 | 4.5 | 4.5 | 15.5 | mid-range |
In the table above, it is seen that customers can find places that serve good food with great service without spending more. Restaurants like La Table de Charbon-Blanc and Restaurant Plus Belle La Vie show that French restaurants can be easy on the wallet while still offering a great experience.
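The comparison between Flunch and similarly priced French restaurants can be sketched as follows. The rows are taken from the tables above, while the 15 to 16 euro price window and the object names are assumptions chosen for this example.

```r
library(dplyr)

# Rows taken from the tables above (average rating and average price)
french <- tibble(
  restaurant_name = c("Flunch", "Flunch", "Bar a Huitres",
                      "Le fromage rit", "Les Chandelles"),
  avg_rating      = c(2.0, 2.5, 4.5, 4.5, 2.0),
  avg_price       = c(15.0, 15.5, 15.5, 15.5, 18.0)
)

# Compare Flunch with other restaurants in the same price window.
# The 15 to 16 euro window is an assumption for this sketch.
peers <- french %>%
  filter(between(avg_price, 15, 16)) %>%
  mutate(is_flunch = restaurant_name == "Flunch") %>%
  group_by(is_flunch) %>%
  summarise(
    n           = n(),
    mean_rating = mean(avg_rating),
    .groups     = "drop"
  )

print(peers)
```

Les Chandelles falls outside the window and is excluded, so the summary compares only restaurants with nearly identical average prices.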
Exploring France a bit deeper
Throughout this EDA, France had a strong presence since most of the restaurants are located there. French restaurants also appeared among the top and bottom restaurants when looking at prices and ratings. Thus, I decided to explore restaurants located in France in particular.
Price level and Ratings
| restaurant_name | avg_rating | price_level |
|---|---|---|
| L'Auberge de la Brie | 5 | expensive |
| The Oystercatcher | 5 | cheap |
| Creperie Ty Gwechall | 5 | mid-range |
| La Maison | 5 | expensive |
| Le Clos de la Prairie | 5 | expensive |
| L'Orée de la Forêt | 5 | expensive |
| Restaurant de La Gare de Percy | 5 | mid-range |
| Bon thé Bonheur | 5 | expensive |
| Le Savignois | 5 | mid-range |
| Restaurant BK | 5 | mid-range |
| Les Chars A Bancs | 5 | mid-range |
| La Cote d'Armor | 5 | mid-range |
| La Goguette | 5 | expensive |
| Don Ulpiano | 5 | mid-range |
| Tea & Ty | 5 | mid-range |
| Copain Copine | 5 | mid-range |
| Restaurant Dolce Vita | 5 | expensive |
| Les Antiquaires | 5 | mid-range |
| AU CREMIER GOURMAND | 5 | mid-range |
| Pause Saveurs | 5 | cheap |
| Le 36 Bonap | 5 | cheap |
| Dolce Italia | 5 | cheap |
| La Terrasse Thalassoleil | 5 | mid-range |
| Influences Sud Ouest | 5 | mid-range |
| LETREIZH Comptoir breton | 5 | cheap |
| Le Neuvieme Art | 5 | expensive |
| Génépi HOTEL | 5 | mid-range |
| Auberge du Grand Megnos | 5 | mid-range |
| Nature Gourmande | 5 | expensive |
| Un p'ti temps K | 5 | cheap |
| Mariottat | 5 | expensive |
| Le P'tit Roseau | 5 | expensive |
| Maison Lameloise | 5 | expensive |
| Aromatique Restaurant | 5 | expensive |
| Bartavelle | 5 | expensive |
| La Trencadis | 5 | expensive |
| La Croissanterie du Lac | 5 | cheap |
| Cote Table | 5 | mid-range |
| Auberge Chez Guth | 5 | expensive |
| Au Faitout | 5 | expensive |
| Restaurant au boeuf rouge | 5 | expensive |
| Auberge de la Tourre | 5 | mid-range |
| Chez Cécile | 5 | mid-range |
| Restaurant Le Very'table | 5 | mid-range |
| Famille Moutier | 5 | mid-range |
| A la Table de Chanelle | 5 | expensive |
| Le Manege Des Saisons | 5 | mid-range |
| Le Pelican | 5 | expensive |
| Ti Blazenn | 5 | expensive |
| Le Mouton Noir | 5 | mid-range |
| Le Vicomte | 5 | mid-range |
| Le Jour Du Poisson | 5 | expensive |
| Auberge des 4 Chemins | 5 | mid-range |
| CoquiThau | 5 | cheap |
| le trou normand | 5 | expensive |
| ARGI-EDER | 5 | expensive |
| Chez Marie | 5 | mid-range |
| Au 14 Fevrier | 5 | expensive |
| Les Sources du Moulin | 5 | mid-range |
| Ti Henri | 5 | mid-range |
| Le Clocher des Pères | 5 | expensive |
Through this bar plot, it is evident that the majority of French restaurants fall into the mid-range category. This suggests that French restaurants offer a variety of services and cuisines that are affordable for consumers. Moreover, their service remains excellent, as indicated in the table, which shows that some top-rated restaurants with an average rating of 5 also belong to the mid-range category.
Additionally, the plot suggests that French restaurants cater to diverse customer budgets. There are upscale, expensive restaurants for special occasions, as well as mid-range and affordable options for casual or informal gatherings. Regardless of the price level customers are seeking, they can still find restaurants with great food, value, and service, as illustrated in the table listing restaurants from cheap to expensive, all with an average rating of 5.
How many French restaurants have an award?
In this pie chart, it is evident that the majority of French restaurants have received awards. This underscores the excellent culinary service that French restaurants offer. These accolades not only enhance their reputation but also contribute to the higher costs associated with some French restaurants. Winning awards can impact not only the customer experience but also the pricing of their menus.
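A minimal sketch of how the share of awarded restaurants might be computed is shown below. The `awards` column name, its text format, and the rows are all hypothetical; the sketch also assumes that a missing or empty value means no recorded award.

```r
library(dplyr)

# Hypothetical rows; the awards column and its format are assumptions
french <- tibble(
  restaurant_name = c("Restaurant A", "Restaurant B",
                      "Restaurant C", "Restaurant D"),
  awards          = c("Certificate of Excellence 2019", NA,
                      "", "Travellers' Choice")
)

award_share <- french %>%
  # Treat both NA and empty strings as "no recorded award"
  mutate(has_award = !is.na(awards) & awards != "") %>%
  summarise(
    n_restaurants    = n(),
    share_with_award = mean(has_award)
  )

print(award_share)
```

As with special diets, a missing value may mean the information was unknown rather than that no award exists, so the computed share depends on how missing values are coded.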